Measuring 3D Video Quality of Experience (QoE) Using A Hybrid Metric Based on Spatial Resolution and Depth Cues

A three-dimensional (3D) video is a special video representation with an artificial stereoscopic vision effect that increases the depth perception of the viewers. The quality of a 3D video is generally measured based on its similarity to the stereoscopic vision obtained with the human vision system (HVS), which in practice means running subjective tests with human viewers. These high-cost and time-consuming subjective tests are used because there is no objective video Quality of Experience (QoE) evaluation method that models the HVS. In this paper, we propose a hybrid 3D-video QoE evaluation method based on spatial resolution associated with depth cues (i.e., motion information, blurriness, retinal-image size, and convergence). The proposed method models the HVS by considering the 3D video parameters that directly affect depth perception, which is the most important element of stereoscopic vision. Experimental results show that the proposed hybrid method outperforms the widely used existing methods in measuring the 3D-video QoE and that it correlates highly with the HVS. Consequently, the results suggest that the proposed hybrid method can be conveniently utilized for 3D-video QoE evaluation, especially in real-time applications.


Introduction
The Quality of Experience (QoE) of a video is a measure that states the satisfaction level from a viewer's perspective. Hence, this measurement is viewer-centric and focuses on the overall satisfaction with and acceptability of a video, taking a holistic approach that evaluates all QoE factors that can affect a viewer's appreciation positively and/or negatively [1].
The QoE is based on real end-user experiences. Therefore, the QoE is directly affected by objective and subjective parameters. The objective parameters originate from Quality of Service (QoS) factors and depend mostly on network performance, software, and hardware features. In contrast, the subjective parameters are determined by the viewers' individual preferences, expectations, previous video experiences, etc. The subjective parameters are therefore more difficult to categorize than the objective parameters. However, they are more likely to arise from the different perception characteristics that people have (e.g., age, eyesight, mobility, perspective, etc.). For this reason, the subjective parameters are more arduous to measure because they are more abstract. A further challenge is the design of a comprehensive QoE metric, which requires a sufficient number of QoE factors. These factors may be controlled, measured, or simply collected and reported [2].
The video quality perceived by a viewer is considered the most important part of the QoE [3]. Three-dimensional videos are special types of video representations that can create a feeling of being in the same space as the scene because depth perception is added by bringing the depth cues to the fore. In addition to the video quality, a vital factor affecting the QoE in 3D videos is therefore the perception of depth evoked in the viewer. The key to increasing the QoE of 3D videos for a viewer is thus to provide 3D video representations that create a plausible depth perception in the viewer.
Depth perception is the main result of the stereoscopic vision process carried out by the HVS [4,5]. Except for people with various visual impairments or losses, every person with normal vision has an HVS that combines monocular and binocular depth cues to achieve stereoscopic vision. Although this system works according to the same principles in every human, the perceived depth may be relative because of different perception characteristics and visual experiences. In other words, different viewers may evaluate the QoE of the same 3D video differently because the depth they perceive is not equally credible to each of them. This is a major obstacle to performing an accurate QoE evaluation for 3D videos.
Currently, the QoE evaluation for 3D videos can be performed by using two methods [5]. One of them relies on subjective evaluation, in which real human observers assess the 3D-video QoE. Subjective quality evaluation is vital for accurately assessing the 3D-video QoE. However, it is difficult to perform because it is time consuming, costly, and unsuitable for real-time applications [6,7]. The other method relies on objective QoE evaluation, in which iterative mathematical and statistical metrics are utilized during the evaluation process. The objectivity of these metrics stems from their rational formulations, which are accepted by researchers and enable reliable and objective evaluations on a regular basis. However, these metrics mostly do not consider the most important characteristics of the HVS for 3D-video perception. Therefore, they generally do not achieve a high correlation with real-human quality evaluations [8].
The subjective and the objective image or video QoE metrics existing in the literature can be categorized as full reference (FR), reduced reference (RR), and no reference (NR) [9–12]. The FR metrics cannot be used without the original video, and the RR metrics require video features obtained from the original video. Therefore, it is not possible to run the FR and the RR metrics simultaneously with a streaming video. On the other hand, the NR metrics do not require the original video or video features obtained from it for the QoE evaluation, which means that they can run simultaneously with a streaming video. However, the FR and the RR metrics can contain more information for the QoE evaluation than the NR metrics. Therefore, the FR, the RR, and the NR metrics each have their own advantages in terms of the QoE evaluation. Another problem is that researchers feel obliged to select one of these three approaches when developing metrics. In addition, pseudo reference image (PRI) quality-evaluation metrics have been developed in recent years. Contrary to the conventional FR, RR, and NR metrics, the PRI metrics use a new type of reference. In conventional metrics, the reference is the original image, which is assumed to have perfect quality, or some derived characteristics of the original image. In the PRI metrics, however, the reference, which is called the pseudo reference image, is generated from the distorted image by further degrading it in several ways and to certain degrees [13,14]. With this approach, the PRI metrics have brought a fresh perspective to the image-quality-evaluation field.
On the other hand, image-quality-evaluation approaches need to be developed according to the characteristics of the digital images obtained by different rendering methods. In general, it is possible to classify digital images into three types according to rendering methods: Natural Scene Image (NSI), Computer Graphic Image (CGI), and Screen Content Image (SCI). The NSIs are digital images captured from the real world and may be degraded by physical causes such as a low-quality lens, being out of focus, motion blur, insufficient or inappropriate lighting conditions, and aerial conditions. CGIs are created or animated by using computer software and are widely used in video games, animations, simulators, etc. They may be degraded by rendering artifacts. SCIs are composite images and consist of texts, graphics, icons, etc. They sometimes also contain NSI and CGI regions, so they may be degraded by the same causes that degrade NSIs and CGIs. Computer-generated SCIs and CGIs have more noise-free smooth areas, high-saturation color content, repeated patterns, and low- or high-frequency contents [15,16]. Because images are likely to be dominated by different defects depending on the rendering method, it is important to measure their quality by using the quality-evaluation method specific to the image's type. Otherwise, the quality measurements may be inaccurate.
There are also metrics developed to deal with some physical drawbacks that reduce the visual quality and the viewer's depth perception. One of them is the metric that is developed for measuring the light field. What exactly we can see depends on our precise position in the light field. The light field records the total of all light rays in 3D space that flow through every point and in every direction. Therefore, the light field contains very rich information. A light-field image contains many depth cues to make depth estimation possible. The light-field quality metrics measure the light-field qualities of the light-field images [17–20].
Additionally, in order to increase the viewer's depth perception, the development of objective quality-evaluation metrics for dehazed images has recently attracted considerable attention. There are many image-dehazing algorithms that remove the haze from images captured in hazy conditions while preserving the intrinsic image structures. To assess and compare the image-dehazing algorithms, subjective and objective methods can be used. Since subjective evaluation is a time-consuming process and difficult to apply, objective quality-evaluation metrics are preferred by researchers [21,22].
Audio-visual content-quality-evaluation issues have also been researched for decades because visual signals are rarely presented without accompanying audio. The distortions that may separately (or conjointly) afflict the visual and audio signals collectively shape the user-perceived QoE [23–25].
Lastly, with the recent rapid developments in the field of virtual reality, developing a quality-evaluation metric for 360-degree images (also known as omnidirectional, panoramic, or virtual-reality images) has become a notable research area. Such 360-degree images and videos include visual information covering the entire 180° × 360° viewing sphere. Hence, compared to conventional 2D content, there are many challenges in developing a quality metric for immersive multimedia. In particular, ultrahigh or even higher resolution requirements and degradations in 360-degree images/videos are the two main challenges. In the quality-evaluation field of 360-degree images and videos, multichannel convolutional neural networks (CNNs) have been successfully used due to their good performance [26–28].
In light of the above explanations, the objective QoE metrics that are frequently used today are not adequate for the 3D-video QoE evaluation. Hence, there is a need to develop a 3D-video QoE evaluation metric that has a high correlation with the HVS. While developing this metric, a QoE-based approach that examines the effects of real visual experiences and different human perception characteristics on depth perception should be utilized. Moreover, designing a hybrid 3D-video QoE evaluation metric that combines the advantages of the FR, the RR, and the NR metrics is a remarkable advantage. The development of a 3D-video QoE evaluation metric with all these properties contributes to the production of more scientific studies on ubiquitous 3D-video technologies.
Considering all of these facts, this study proposes a hybrid 3D-video QoE evaluation metric relying on the depth cues associated with the spatial-resolution feature of a 3D video, which strongly influences the depth-perception experience of a 3D viewer. These depth cues are the blurriness and motion information extracted from the 2D-texture videos and the retinal-image size and convergence extracted from the depth maps (DMs). As the first step of the metric-development process, prediction models are developed for these depth cues. Because of the nonobjective features of the 3D videos, such as the perceived depth and naturalness, which differ from person to person, subjective tests are applied to evaluate the QoE of the 3D videos. Then, the depth cues and the Mean Opinion Score (MOS) values obtained from the subjective tests are subjected to a correlation analysis to form the proposed hybrid 3D-video QoE evaluation metric. The performance-evaluation results obtained with the proposed metric demonstrate its effectiveness in assessing the 3D-video QoE.
The rest of this paper is organized as follows: Section 2 reviews state-of-the-art studies. Section 3 explains the proposed hybrid 3D-video QoE evaluation metric. Section 4 presents the results and the discussions. The paper concludes with the conclusions and future works given in Section 5.

State-of-the-Art Studies
In this section, we provide an overview of the existing studies in the literature in two parts, adhering to the reference-classification approach in Section 1. The first part presents the FR and the RR metrics together, which need to take the original video or some features of the original video as a reference, respectively. The second part covers the NR metrics, which do not need any reference for the video-quality-measurement process. Finally, we present an evaluation of the state-of-the-art studies to identify the literature gap.

Reference-Based Metrics
In [29], the use of objective two-dimensional (2D) video-quality metrics for the 3D-video-quality assessment (VQA) is discussed, and a perceptual-based objective metric that mimics the HVS is proposed. In this study, the luminance component is taken as an input parameter in the development of the metric. According to the experimental results, the proposed Perceptual Quality Metric (PQM) is found to be appropriate for 2D- and 3D-video-quality evaluations since it mimics the MOS and aligns with it more closely than the Video Quality Metric (VQM). In the FR metric in [30], the HVS properties, such as the contrast-sensitivity function and luminance masking, are taken into account, and the 3D-DCT transform is used to analyze the perceptual similarity of the blocks in the left and right views of the stereoscopic video frames. In [31], a 3D structural-similarity (3D-SSIM) approach is proposed. The proposed algorithm regards a video signal as a 3D volume image and combines a local SSIM-based quality measure with local information content and distortion-based pooling methods. The proposed metric in [32] uses blocking artifacts, blurring in edge regions, and the video-quality difference between two views. The proposed metric in [33] uses the color-video information and the depth information as the input parameters. The color-quality metric (CQM) for 3D videos proposed in [34] takes the luminance coefficient into consideration as the HVS is much more sensitive to it than to the chrominance coefficient of a frame. In [35], the proposed metric focuses on the inter-view correlation of the spatial-temporal structural information extracted from adjacent frames. In [36,37], the proposed FR 3D-video-quality metric is modeled around the HVS, fusing the information of both the left and right channels and considering color components, the cyclopean views of the two videos, and the disparity. Since the metric also considers the screen size, video resolution, and the distance of the viewer from the screen, it is possible to use this metric in different applications. In [38,39], an RR stereoscopic VQA metric is proposed, which comprises spatial neighboring information from the contrast of grey-level co-occurrence matrices for both color and depth and edge properties.
In [40], an FR stereoscopic video-quality-assessment (SVQA) metric is proposed based on the Stereo Just-Noticeable Difference (SJND) model, which uses contrast, spatial masking, temporal masking, and binocular masking factors to mimic the HVS. In [41], an FR stereoscopic VQA metric is proposed by using measurements of structural distortions, blurring artifacts, and content complexity. In the FR metric proposed in [42], human stereoscopic vision is modeled by combining left-eye-view and right-eye-view information through 3D-DCT transformation, and the contrast sensitivity of the HVS is considered as well as the depth information of the scene. The metric proposed in [43] is developed by incorporating the stereoscopic visual-attention (SVA) metric into the SVQA metric in order to benefit the image-quality-evaluation metrics. The proposed metric in [44], in which the SSIM metric is adapted to stereoscopic videos, is the product of approaches that combine SSIM maps and depth maps with local and global weighting methods. In [45], following the approach that the 3D distortions affecting the 3D video quality should also be taken into account when developing a 3D VQA metric, the proposed metric uses texture distortions (i.e., ghost effects and contour artifacts) and depth distortions as the input parameters. In [46], an FR 3D VQA metric based on the dependencies between motion and its binocular disparities is developed. This metric calculates the spatial, temporal, and depth features and uses them in the ultimate quality calculation. The proposed metric in [47] is used for the quality evaluation of various asymmetrically compressed stereoscopic 3D videos. It is observed that the results obtained from the proposed 2D-to-3D metric are more successful than the results obtained from the direct averaging method. The metric proposed in [48] uses two important phenomena (i.e., binocular suppression and recurrent excitation) to model the HVS better and improve depth perception. The FR 3D-video-quality metric proposed in [49] is based on measuring the directional dependency between the motion and depth sub-band coefficients of stereoscopic 3D videos. The proposed metric in [50] evaluates the quality of 3D videos synthesized with DIBR from three aspects: the quality of unoccluded regions, the quality of first-order similarity, and the quality of second-order similarity, using an energy-based sequence-mapping strategy. Another SSIM-based metric in [51] uses the perceptually significant features, contrast, and motion characteristics that have an impact on the HVS.

NR Metrics
In [52,53], an objective metric (3VQM) is proposed for Depth-Image-Based Rendering (DIBR)-based stereoscopic 3D videos. In this metric, the ideal depth map is first estimated and then used to derive three distortion measures (temporal outliers (TO), temporal inconsistencies (TI), and spatial outliers (SO)) to quantify the visual discomfort in the stereoscopic videos. The combination of the three measures constitutes a vision-based quality measure for 3D DIBR-based videos. In the metric proposed in [54], four factors that affect human perception and visual comfort are examined (temporal variance, disparity variation in the intraframes, disparity variation in the interframes, and disparity distribution in the frame-boundary areas). In [55], motion and parallax information obtained from depth maps and their histograms are the main parameters of the proposed stereoscopic VQA metric. The results show good performance for video sequences that contain effects that are annoying for the human eye.
In [56], an NR stereoscopic VQA metric that considers the correlation between the packet loss in the network and the perceptual video quality is proposed. The metric yields better results than existing objective metrics and can be used in real time when monitoring network statistics. The NR metric proposed in [57], which can be used in the quality measurement of 3D videos that are corrupted or degraded after transmission, uses disparity-index-based dissimilarity measurements and edge-detection-based perceptual-difference measurements. Experimental results demonstrate the effectiveness of the proposed metric. In [58], a stereoscopic VQA metric is proposed to quantify the perceived quality of transmitted and degraded stereoscopic videos. The extracted features are accumulated according to the binocular suppression, which is performed by measuring dissimilarity based on the disparity index and perceptual difference based on edge detection. According to the results, considering the effect of binocular rivalry in a stereoscopic video-quality metric seems to be effective at reflecting the HVS sensitivity and increasing the overall quality.
The proposed NR metric in [59], which examines the effect of the variable network conditions on the 3D-video quality, uses the frame rate, bit rate, and network-packet-loss rate. In [60], the proposed NR metric considers the motion vector lengths and depth information for the 3D-video-quality evaluation. In [61], an NR 3D objective VQA metric that estimates the 3D quality by taking into account the spatial distortions, excessive disparity, depth representation, and temporal information of the video is proposed. The metric is resolution- and frame-rate-independent. To estimate the amount of spatial distortion in the video, the proposed metric computes blockiness. In [62], an extended NR objective 3D VQA metric that can run in real time is proposed. For this purpose, the network-packet loss, video-transmission bit rate, and frame-rate parameters are used as the input parameters.
In [63], a stereo VQA metric that models the binocular perception effect in multiviews, including the spatial domain, the temporal domain, and the spatial-temporal domain, is proposed. In [5], a depth-perception quality metric is applied to a blind stereoscopic video-quality evaluator to obtain an NR stereoscopic video-quality metric. The proposed NR metric in [64] is based on modeling the joint statistical dependencies between the motion and depth sub-band coefficients. In the proposed metric in [65], the components in the spatial and frequency domains associated with the HVS are used for the 3D VQA. In [66], the proposed NR stereoscopic VQA metric first utilizes the 3D saliency map of the sum map, then uses sparse representation to decompose the sum map of 3D saliency into coefficients, and finally calculates features based on the sparse coefficients to obtain an effective expression of the videos' message.
The study in [67] introduces a 3D convolutional-neural-network-based SVQA framework that can model not only local spatiotemporal information but also global temporal information with cubic-difference video patches as the input. In [68], a blind NR 3D VQA metric, which is based on the HVS mechanism and natural video statistics of 3D-video characteristics, is proposed. In [69], a stereoscopic VQA metric based on motion perception is proposed. In [70], a comprehensive stereoscopic VQA metric based on the joint contribution of multiple-domain information and a new interframe cross about spatiotemporal information is proposed.
Apart from these studies, the study in [71] examines the added value of using stereo saliency prediction in FR and NR quality-evaluation cases.

Evaluation of the State-of-the-Art Studies
As can be seen from the review above, there are three important limitations regarding the QoE evaluation of 3D videos from the depth-perception perspective. The first is that it is very difficult to measure the depth cues in 3D videos with current rational methods and scientific approaches. Only a limited number of the many factors affecting the human 3D-video QoE can be considered, and these factors are evaluated only within the limits permitted by well-known scientific approaches. The second limitation is that the results obtained from objective 3D-video QoE metrics do not correspond exactly to the 3D-viewing perception of an end user. Therefore, the most important problem with objective 3D QoE evaluation metrics is arguably the lack of a high correlation with the human depth-viewing perception. The last major problem is that researchers habitually design their proposed metrics solely around the traditional FR, RR, or NR approaches.
Considering the limitations described above, the QoE evaluation metric proposed in this study for 2D + DM-formed 3D videos is designed by using spatial-resolution-associated depth cues, which directly affect the depth perception of the viewer (i.e., the blurriness and motion information measured on the 2D-texture videos and the retinal-image size and convergence measured on the DM sequences). Moreover, the NR and the RR types are integrated to make a hybrid metric. In light of these facts, the proposed study develops a hybrid 3D-video QoE evaluation metric that uses depth cues from two different sources and departs from the routine FR, RR, and NR classification approach to which researchers have so far been confined.

Proposed Hybrid 3D-Video QoE Evaluation Method
In this paper, we propose a hybrid 3D-video QoE evaluation metric that utilizes depth cues associated with spatial resolution (i.e., blurriness and motion information extracted from the 2D-texture videos, and retinal-image size and convergence extracted from the depth maps).
We have a salient reason for our focus on the depth cues associated with spatial resolution in this study. In 3D videos, the depth-perception satisfaction of a viewer is at the forefront, and the viewer unwittingly encounters many depth cues. Because 3D videos have such a high depth-cue density, a rule of thumb in developing a 3D-video QoE evaluation metric is to design it based on the QoE factors that increase the depth perception of the viewer. Since a significant number of these factors are closely related to spatial resolution, it is appropriate to start with the spatial resolution.
Spatial resolution can be defined as the number of pixels used for displaying a certain area of a digital image that shows a plane defining a finite volume in unlimited space. In a digital image, the smaller the area a pixel occupies on an object, the more pixels are used to represent that object. Accordingly, as the number of pixels per area (i.e., the spatial resolution) in a digital image increases, it is possible to display more detail [72].
Considering a digital image with different spatial-resolution versions, objects are represented with a greater number of pixels in the higher-spatial-resolution version of this image. Therefore, the pixel-related losses in the objects are smaller and the outlines that delineate the objects are more visible. As the objects become more apparent, it becomes easier to distinguish them from their background and from other objects. Thus, the viewer's depth perception increases. In contrast, in the lower-spatial-resolution version, objects are represented with fewer pixels due to the use of larger pixels. The increase in pixel-related losses in objects results in the loss of detail in the image and a decrease in the viewer's depth perception [72].
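To make the relationship between spatial resolution and the number of pixels representing an object concrete, the following minimal Python sketch counts the pixels covering an object that occupies a fixed fraction of the frame. The frame sizes assigned to the SD, CIF, and QCIF labels and the object fractions are illustrative assumptions, not values taken from this study.

```python
# Assumed frame sizes (width, height) for each label; these follow common
# usage and are NOT taken from the paper.
RESOLUTIONS = {"SD": (720, 576), "CIF": (352, 288), "QCIF": (176, 144)}


def object_pixel_count(width, height, frac_w=0.25, frac_h=0.5):
    """Pixels covering a rectangular object spanning a fixed fraction of the frame.

    frac_w and frac_h are hypothetical: the object is assumed to cover a
    quarter of the frame width and half of the frame height.
    """
    return round(width * frac_w) * round(height * frac_h)


# The same object is represented with fewer pixels as the resolution drops.
counts = {name: object_pixel_count(w, h) for name, (w, h) in RESOLUTIONS.items()}

for name, count in counts.items():
    print(f"{name}: {count} pixels on the object")
```

Under these assumptions, halving both frame dimensions leaves roughly a quarter of the pixels on the object, which is the loss of object detail described above.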
On the other hand, depth cues in 2D color videos and associated DM sequences cause the viewer to perceive more or less depth depending on the spatial resolution. In the version with high spatial resolution, the HVS obtains better-quality stereoscopic vision because it perceives the monocular and binocular cues that create depth perception more strongly and more comfortably. In contrast, in the version with low spatial resolution, the cues that create depth perception disappear or become barely noticeable to the viewer. In this case, it is not possible to obtain a superior-quality stereoscopic view [73].
For the reasons explained above, the spatial resolution of the 3D videos is an important factor that directly affects a viewer's depth-perception experience. Therefore, this research study focuses on developing a 3D-video QoE evaluation metric that considers the role of this factor in obtaining stereoscopic vision in the HVS.
The framework of the proposed 3D-video QoE evaluation metric is illustrated in Figure 1. As shown in the framework, we use a 3D-video representation produced by the 2D + DM method. The 2D + DM method has become one of the most preferred 3D-video-creation techniques due to its support for coding, transmission, and compression technologies [74].
As can also be seen from Figure 1, because 3D videos obtained with the 2D + DM method are used in this study, the proposed metric has two main elements, one from the 2D-texture video ($M_C$) and the other from the DM ($M_D$). Each of these elements has its own effect on the viewer's perception of depth, and each contributes separately to the artificial stereoscopic vision. A change in one of these elements directly causes the viewer's depth perception to change. The reflection of this change in the artificial stereoscopic vision occurs independently of the other element. Therefore, there is an additive relationship between these elements, and this relationship can be expressed in a metric based on superposition theory. In light of these explanations, the proposed metric combines these two elements as follows:
$$M_{3D} = M_C + M_D \qquad (1)$$
where $M_{3D}$ denotes the proposed metric. The $M_C$ element captures the effects of the two depth cues in the texture video (blurriness and motion information) and of the spatial resolution of the 2D-texture video on the depth perception of the viewers. The $M_D$ element captures the contributions of the two depth cues in the DM (i.e., the retinal-image size and convergence) and of the spatial resolution to the depth perception of the viewers. The $M_{3D}$ value ranges from 0 to 15.

Proposed Models for the Depth Cues
As stated in Section 3, the proposed metric is constructed for the 3D-video representation produced by the 2D + DM method. The 2D-texture videos are the main components of the 3D videos. While the main QoE factors that create depth perception in the viewer are the depth cues hidden in the 2D-texture videos, the auxiliary DM sequences contain a depth-information pixel corresponding to each pixel in the associated 2D-texture video. The quality of the viewers' 3D-video viewing experience increases significantly when these QoE factors are used effectively or brought into the foreground. Therefore, 3D videos that succeed in showing more realistic scenes to the viewer because they are rich in depth cues achieve a high QoE.

The 2D + DM-formed 3D videos are tailor-made for measuring the depth cues.They allow for measuring the depth cues in the 2D-texture videos and DM sequences separately and provide the possibility to measure depth cues from two separate sources.

Blurriness
The blur defect, which directly affects the quality of digital images, manifests itself as a reduction in the high-frequency components that contain the edge information of the image. Accordingly, the values of neighboring pixels in the blurred parts of an image converge.
The resulting ambiguity, especially in the edge information of objects, prevents the viewer from recognizing the shapes of the objects or from distinguishing them from each other or from the background. This situation dramatically reduces the viewer's perception of depth. Therefore, blurriness is an unacceptable flaw in 3D videos, and it can be associated with the spatial resolution of these videos.
In this study, to scale the blurriness, the total standard deviation of the 2D-texture videos is normalized by the spatial resolution and the frame rate, as given in Equation (2), where $B$ is the blurriness, $i$ is the frame number, $f$ is the total number of frames, $j$ is the pixel number, $N$ is the total number of pixels, $x_j$ is the pixel value, $\bar{x}$ is the mean of the pixel values in the frame, $F$ is the frame rate, and $S$ is the spatial resolution. Table 1 presents the blurriness measurements of the 2D-texture videos calculated by using Equation (2). According to Table 1, the amount of blurriness stays close across the versions of a selected 2D-texture video (e.g., Breakdance) that share a spatial resolution (e.g., SD) and are encoded with a gradually increasing compression ratio from QP = 25 to QP = 45. In contrast, the amount of blurriness fluctuates across the versions of a selected 2D-texture video (e.g., Ballet) that are encoded with the same compression ratio (e.g., QP = 25) while the spatial resolution changes gradually from SD to QCIF. These observations indicate a correlation between blurriness and spatial resolution.
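The measurement described above can be sketched as follows. This is a minimal illustration under one plausible reading of Equation (2): the population standard deviation of the pixel values is computed per frame, summed over all frames, and divided by the product of the frame rate and the spatial resolution. The function names, the 1/N form of the deviation, and the exact placement of $F$ and $S$ are assumptions made for illustration; they are not taken from the paper.

```python
import math

def frame_std(pixels):
    """Population standard deviation of one frame's pixel values."""
    n = len(pixels)
    mean = sum(pixels) / n
    return math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)

def blurriness(frames, frame_rate, spatial_resolution):
    """Sum of per-frame standard deviations, normalized by F * S (assumed form)."""
    total_std = sum(frame_std(frame) for frame in frames)
    return total_std / (frame_rate * spatial_resolution)

# Toy example: two 2x2 frames, flattened to lists of pixel values.
sharp = [0, 255, 0, 255]        # strong edges, large deviation
blurred = [120, 135, 120, 135]  # neighboring values converge, small deviation

b_sharp = blurriness([sharp, sharp], frame_rate=25, spatial_resolution=4)
b_blurred = blurriness([blurred, blurred], frame_rate=25, spatial_resolution=4)
```

In this toy case, the frames whose neighboring pixel values converge yield the smaller value, which matches the description of blur as a loss of edge information.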

Motion Information
One of the most remarkable parameters affecting the depth perception of a viewer is the motion information of a 3D video. The motion information depends on the motion density of the video frames.
The motion density of a frame increases with the spatial resolution of the frame: the higher the spatial resolution, the higher the motion density.
Optical-flow vectors are used to measure the motion density of the frames. They can be calculated with dense or sparse optical-flow algorithms. Dense optical flow computes, globally, the displacement of every pixel between the previous frame and the current frame; therefore, displaced and non-displaced pixels are both included in the calculation. Sparse optical flow, on the other hand, computes the displacement locally, only for the displaced pixels between the previous frame and the current frame.
In this study, the motion information is measured with optical-flow vectors calculated by the Horn and Schunck method, which is a dense optical-flow algorithm.
The motion information is calculated by normalizing the average of the total motion density in a video sequence, as given in Equation (3) [75], where $M$ is the motion information, $i$ is the frame index, $f$ is the total number of frames, $\Pi(i)$ is the motion density of the $i$-th video frame, $F$ is the frame rate, and $S$ is the spatial resolution. $\Pi(i)$ is in turn calculated from the motion vectors of the frame [75], where $d$ is a feature point in the frame, $n$ is the number of feature points in the frame, and $V_d(x_i, y_i)$ is the motion vector of the $i$-th frame at feature point $d$. Table 2 shows the motion-information measurements of the 2D-texture videos computed by using Equation (3).
In Table 2, it is noticeable that as the compression ratio gradually increases (from QP = 25 to QP = 45) in the SD, CIF, or QCIF spatial-resolution forms of each 3D video, the motion amount gradually decreases, which shows a strong relationship between the motion information and the compression ratio. In addition, as the spatial resolution gradually increases (from QCIF to SD) at any QP value, the motion amount gradually increases, which shows a second strong relationship, between the motion information and the spatial resolution.
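A minimal sketch of the two-step calculation described above, assuming that the motion density $\Pi(i)$ is the mean magnitude of the motion vectors at the feature points of frame $i$, and that $M$ is the average density over the sequence divided by $F \cdot S$. Both forms, the function names, and the sample vectors are assumptions for illustration; the optical-flow estimation itself (Horn and Schunck in the study) is not implemented here and the vectors are simply given.

```python
import math

def motion_density(vectors):
    """Mean magnitude of the motion vectors of one frame (assumed form of Pi(i))."""
    if not vectors:
        return 0.0
    return sum(math.hypot(vx, vy) for vx, vy in vectors) / len(vectors)

def motion_information(per_frame_vectors, frame_rate, spatial_resolution):
    """Average motion density over the sequence, normalized by F * S (assumed form)."""
    densities = [motion_density(v) for v in per_frame_vectors]
    mean_density = sum(densities) / len(densities)
    return mean_density / (frame_rate * spatial_resolution)

# Hypothetical flow vectors (vx, vy), in pixels, for two frames.
flow = [
    [(3.0, 4.0), (0.0, 0.0)],  # densities: (5 + 0) / 2 = 2.5
    [(6.0, 8.0), (0.0, 0.0)],  # densities: (10 + 0) / 2 = 5.0
]
m = motion_information(flow, frame_rate=25, spatial_resolution=4)
```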

Retinal-Image Size
According to Emmert's law [76], the distance between an object and its viewer can be calculated by using the actual size of the object and the size of its image on the viewer's retina (see Figure 2). The law relates three quantities: $P$, the size of the object; $D$, the distance of the object to the viewer's eye; and $R$, the size of the image of the object formed on the retina. Since $P$ does not change, $R$ decreases when $D$ increases and vice versa. In other words, when the object moves away from the viewer and the depth increases, the retinal-image size of the object decreases, and the viewer perceives it as smaller. Conversely, when the object moves nearer to the viewer and the depth decreases, the retinal-image size of the object increases, and the viewer perceives it as larger. This phenomenon shows up as a change in the pixel values of the DM sequences of 3D videos. While an object moves farther or nearer, the depth-pixel values change between 0 and 255 depending on the depth of the object, and the depth-pixel colors take gray tones. White corresponds to the nearest distance and black corresponds to the farthest distance (see Figure 3).
In light of this information, this study proposes to use the change in the depth-pixel values in the DM to compute the retinal-image size. This change can be calculated with the Mean Absolute Deviation (MAD) method for each DM frame, as given in Equation (6), where $X_{i,j}$ is the depth-pixel value at point $(i, j)$; $\bar{X}$ is the average depth-pixel value of the DM frame; and $m$ and $n$ are the width and height of the frame, respectively. Table 3 presents the retinal-image-size measurements of the DM sequences calculated by using Equation (6). It shows that the retinal-image-size measurements fluctuate across the versions of a selected DM (e.g., Advertisement) that share a spatial resolution (e.g., SD) and are encoded with a gradually increasing compression ratio from QP = 25 to QP = 45. This fluctuation means that there is no significant relationship between the retinal-image size and the compression ratio. This lack of relationship is most likely caused by spatial and temporal distortions in the DM sequences due to encoding, compressing, resizing, upsampling, downsampling, or similar processes. Table 3 also shows that the retinal-image-size measurements gradually increase or tend to increase across the versions of a selected DM (e.g., Butterfly) that are encoded with the same compression ratio (e.g., QP = 25) while the spatial resolution changes gradually from SD to QCIF. This indicates a strong relationship between the retinal-image size and the spatial resolution.
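The per-frame MAD described above can be sketched as follows, using the standard definition of the mean absolute deviation over all depth pixels of a frame. The function name and the toy frames are assumptions for illustration; how the per-frame values are aggregated over a sequence and related to the spatial resolution is not shown here.

```python
def mean_absolute_deviation(depth_frame):
    """MAD of one depth-map frame, given as a list of rows of values in 0..255."""
    values = [v for row in depth_frame for v in row]
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)

# A frame with a single depth plane has no depth variation.
flat = [[128, 128], [128, 128]]
# A frame mixing the farthest (0, black) and nearest (255, white) depths.
layered = [[0, 255], [0, 255]]

mad_flat = mean_absolute_deviation(flat)
mad_layered = mean_absolute_deviation(layered)
```

A frame whose objects sit at widely different depths yields a larger MAD than a frame in which all pixels share one depth.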

Convergence
The position of an object affects the viewing angle of the eyes. Convergence means that an object moving closer to the viewer's eyes is seen with a greater viewing angle. Therefore, convergence is a factor that directly increases the depth perception of the viewer. As seen in Figure 4, the viewing angle for an object positioned at a distance $d$ from the viewer is calculated with Equation (7), where $\alpha$ is the viewing angle and $x$ is the distance between the two human eyes. In the literature, $x$ is taken as 65 mm [77].
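A minimal sketch of the viewing-angle calculation, assuming the usual isosceles-triangle geometry in which the two eyes, separated by $x$, fixate a point at distance $d$ straight ahead, so that $\alpha = 2\arctan\left(x / (2d)\right)$. This closed form is an assumption that is consistent with the definitions above; it is not a quotation of Equation (7). The function name and the sample distances are hypothetical.

```python
import math

EYE_SEPARATION_MM = 65.0  # interocular distance adopted in the text

def viewing_angle_deg(distance_mm, eye_separation_mm=EYE_SEPARATION_MM):
    """Viewing (convergence) angle in degrees for a point at distance_mm straight ahead."""
    if distance_mm <= 0:
        raise ValueError("distance must be positive")
    half_angle = math.atan(eye_separation_mm / (2.0 * distance_mm))
    return math.degrees(2.0 * half_angle)

near = viewing_angle_deg(1000.0)  # object 1 m from the viewer
far = viewing_angle_deg(3000.0)   # object 3 m from the viewer
```

Under this assumption, the nearer object is seen with the larger angle, which agrees with the description of convergence above.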

The viewing angles differ for objects located at the same distance from the eye but with different volumes and surface areas. In Figure 4, two objects with different surface areas ($S_1 > S_2$) are positioned at the same distance $d$ from the viewer. Accordingly, the viewing angle for the object with the larger surface area ($\alpha_1$) is smaller than the viewing angle for the object with the smaller surface area ($\alpha_2$). The same holds for DM sequences with different spatial resolutions.
According to the geometric analysis in Figure 4, if the viewers watch SD-, CIF-, and QCIF-sized DM sequences of a 2D-texture video, they perceive that the distance of the objects does not change, but that the objects in the DM sequences are reduced in size and settle in farther locations. This means that DM sequences with a lower spatial resolution are viewed with a larger viewing angle.
In order to obtain the convergence in this study, the viewing angles are calculated by using Equation (7) for each frame of the DM sequences, and the total viewing angle is normalized as given in Equation (8), where $C$ is the convergence, $i$ is the frame index, $f$ is the total number of frames, $S$ is the spatial resolution, and $\alpha$ is the angle of convergence. Table 4 presents the convergence measurements of the DM sequences computed by using Equation (8). It shows that the convergence measurements fluctuate across the versions of a selected DM (e.g., Interview) that share a spatial resolution (e.g., SD) and are encoded with a gradually increasing compression ratio from QP = 25 to QP = 45. As in the retinal-image-size case, this fluctuation means that there is no significant relationship between the convergence and the compression ratio, for the reasons explained before. Table 4 also shows that the convergence measurements gradually increase or tend to increase across the versions of a selected DM sequence (e.g., Windmill) that are encoded with the same compression ratio (e.g., QP = 25) while the spatial resolution changes gradually from SD to QCIF. So, a strong relationship between the convergence and the spatial resolution can be observed.

Subjective Tests
The results of subjective tests conducted within the framework of standards adopted by major standards bodies are, in effect, derived directly from the human vision system. Thus, the subjective test results represented by the MOSs make it possible to consider the relative effect of the depth cues on the viewers. In this study, subjective tests are conducted to construct a relationship between the MOS values and the proposed metric. After the tests, the 95% confidence intervals [78] are also computed together with the MOS values.
The subjective tests are carried out independently of the metric design by using 10 different 2D + DM-formed 3D videos (Breakdance, Ballet, Windmill, Newspaper, Interview, Advertisement, Butterfly, Chess, Farm, and Football) in different spatial resolutions (i.e., SD, CIF, and QCIF) encoded with 25, 30, 35, 40, and 45 Quantization Parameters (QPs). A 23-inch autostereoscopic display is used to present the 2D + DM-formed 3D videos during the experiments.
Before the subjective tests, the participants are sufficiently informed about the features of the test and the scoring. The scores given by the observers range from one to five. A five indicates that perception is at the highest level, and a one indicates that it is at the lowest level. The observers are not informed about the order, coding parameters, and features of the test videos.
The observers participating in the tests do not have expertise in 3D videos. They sit 3 m away from the autostereoscopic screen. The tests are always carried out in the same test environment. To create the 3D videos, the same sized and encoded DM sequences and 2D-texture videos are used.
During the tests, the Single Stimulus Continuous Quality Evaluation (SSCQE) method is used for quality evaluation. The observers evaluate the quality and depth perception of the encoded 3D video and the overall 3D-video quality separately, without taking a 3D video as a reference. While making this evaluation, the observers rely on their previous viewing experiences. All test results are screened for inconsistent scores according to the ITU-R BT.500-13 standard [78]. As a result, the scores of 2 of the 23 observers who participated in the test are determined to be inconsistent. The test results of the remaining 21 observers are used to calculate the MOS values.
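The MOS and its 95% confidence interval for one test video can be sketched as follows. This is a minimal illustration that assumes the common normal-approximation form, in which the half-width of the interval is 1.96 times the sample standard deviation divided by the square root of the number of observers. The function name and the scores are hypothetical, and the observer-screening step is not reproduced.

```python
import math
import statistics

def mos_with_ci(scores, z=1.96):
    """Return (MOS, half-width of the 95% confidence interval) for one test video."""
    n = len(scores)
    mos = statistics.mean(scores)
    # Sample standard deviation (n - 1 in the denominator).
    std = statistics.stdev(scores)
    return mos, z * std / math.sqrt(n)

# Hypothetical scores from 21 retained observers on the 1-5 scale.
scores = [4, 5, 4, 3, 4, 4, 5, 4, 3, 4, 4, 5, 4, 4, 3, 4, 5, 4, 4, 3, 4]
mos, ci = mos_with_ci(scores)
```

The interval for the video is then reported as the MOS plus or minus the returned half-width.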

Modeling of $M_C$ and $M_D$

Modeling of $M_C$
As stated above, the $M_C$ element, which represents the QoE evaluation component of the 2D-texture video, combines two depth cues, namely the blurriness and motion information existing in the 2D (i.e., texture) video, with the spatial resolution of the 2D video. To form a model for this element, the results of the subjective tests are integrated with the $M_C$ element to obtain a more efficient 3D-video-quality metric. During this integration, the form of $M_C$ that gives the best correlation with the subjective test results is taken as its mathematical equation; the Pearson correlation method is used for this calculation. The common feature of the depth cues is that they change when the spatial resolution changes. Therefore, a multiplicative relationship between the depth cues and the spatial resolution is considered to best reflect the viewers' depth perception in the proposed model. With this approach, the mathematical equation of $M_C$ is determined as Equation (9), where $B$ and $M$ are the blurriness and motion-information depth cues, respectively, and $S_C$ is the spatial resolution of the 2D-texture video. In addition, the constant coefficient $k_C$ in Equation (9) is selected as $10^{-4}$ for all 2D videos in order to keep the $M_{3D}$ values within the specified interval.

Modeling of $M_D$
As discussed above, the $M_D$ element, which represents the DM-quality-evaluation component, captures the contributions of the two monocular cues in the DM (i.e., the retinal-image size and convergence) and the spatial resolution to the depth perception of a viewer. To construct a model for this element, as was done for the $M_C$ element, the results of the subjective tests are integrated with the $M_D$ element, which is adopted as the product of the two monocular cues and the spatial resolution of the DM sequences so that it contributes more to the proposed metric. As in the $M_C$ model, the common feature of the monocular cues is that they vary when the spatial resolution varies, and a multiplicative relationship between the monocular cues and the spatial resolution is a useful assumption to reflect the viewers' depth perception in the proposed model.
In this sense, the mathematical model of the $M_D$ element is formulated as Equation (10), where $R$ and $C$ are the retinal-image-size and convergence monocular depth cues, respectively, and $S_D$ is the spatial resolution of the DM. Also, the constant coefficient $k_D$ in Equation (10) is selected as $2.5 \times 10^{8}$ for all DMs in order to keep the $M_{3D}$ values within the specified interval.
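The two component models can be sketched as plain products, following the multiplicative relationship described in the text. The product form for $M_D$ follows the description of that element as the product of the two monocular cues and the spatial resolution; the same form for $M_C$ is an assumption inferred from the stated multiplicative relationship. The function names and input values are hypothetical, the coefficients are the ones stated in the text, and the combination of the two components into $M_{3D}$ is not shown.

```python
K_C = 1e-4   # coefficient stated for the texture-video component
K_D = 2.5e8  # coefficient stated for the depth-map component

def m_c(blurriness, motion, s_c, k_c=K_C):
    """Texture-video component: product of its two cues and the resolution (assumed form)."""
    return k_c * blurriness * motion * s_c

def m_d(retinal_size, convergence, s_d, k_d=K_D):
    """Depth-map component: product of its two monocular cues and the resolution."""
    return k_d * retinal_size * convergence * s_d

# Hypothetical cue values, chosen only to exercise the functions.
texture_score = m_c(blurriness=2.0, motion=3.0, s_c=10_000.0)
depth_score = m_d(retinal_size=1e-6, convergence=2e-6, s_d=1_000.0)
```

Because each component is a product, a cue that drops to zero drives the whole component to zero, which is one consequence of choosing a multiplicative rather than an additive model.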

Results and Discussions
In this study, the last 150 frames of ten different 2D + DM-formed 3D videos (Breakdance, Ballet, Windmill, Newspaper, Interview, Advertisement, Butterfly, Chess, Farm, and Football) with different spatial resolutions (i.e., SD, CIF, and QCIF) and encoded with QP values of 25, 30, 35, 40, and 45 are used to derive results from the proposed metric. The publicly available original versions of these videos were provided by the I-Lab, Center for Vision, Speech, and Signal Processing at the University of Surrey, UK, for research purposes. In order to evaluate the performance of the proposed metric, the MOS values and the results of widely used 2D-video quality-evaluation metrics, namely the VQM, Peak Signal-to-Noise Ratio (PSNR), and structural-similarity metric (SSIM), are also calculated by using the same 3D videos. All video-quality measurements are reported with a precision of four digits after the decimal point.
Tables 5-14 show the quality measurements of the videos in terms of the MOS, VQM, PSNR, and SSIM results. The confidence-interval values for the MOS results are also presented in the tables. According to the MOS, VQM, PSNR, and SSIM results, as the compression ratio gradually increases (from QP = 25 to QP = 45) in the SD, CIF, or QCIF spatial-resolution forms of each 3D video, the 3D-video QoE perceived by the viewer decreases. This shows the effects of the video spatial resolution and video compression ratio on the 3D-video QoE. The results obtained from the proposed metric resemble the MOS results as well as the VQM, PSNR, and SSIM results. As can also be observed in the tables, the highest quality measurements calculated by the objective VQM, PSNR, and SSIM methods are obtained from the lowest compression ratio (QP = 25) versions of the SD, CIF, and QCIF spatial-resolution videos. As the compression ratio increases gradually, the video quality decreases slightly at each compression level compared to the previous one. A similar gradual decrease is observed in the MOS measurements obtained from the subjective tests.
The remainder of this section discusses the $M_{3D}$ measurements of the proposed metric. Table 5 shows the quality measurements of the video "Breakdance". The $M_{3D}$ values obtained from the proposed metric are similar to both the objective video-quality measurements and the subjective MOS measurements. In other words, as with the other video-quality-measurement methods, the highest quality measurements of the proposed metric are obtained from the lowest compression ratio (QP = 25) versions of the SD, CIF, and QCIF spatial-resolution videos. As the compression ratio gradually increases, the $M_{3D}$ measurement decreases. The same situation is observed for the video "Interview" in Table 7.
Table 6 gives the quality measurements of the video "Ballet". The $M_{3D}$ measurements obtained from the proposed metric are generally similar to both the objective video-quality measurements and the subjective MOS values. Only the $M_{3D}$ measurements for the QP = 30 and QP = 35 compression ratios at the CIF spatial resolution are equal. Here, the $M_{3D}$ value for QP = 35 is expected to be lower than that for QP = 30 but not lower than that for QP = 40; equivalently, the value for QP = 30 is expected to be higher than that for QP = 35 but not higher than that for QP = 25. These expectations hold for all of the QP-related results, and the equality arises because the $M_{3D}$ values for QP = 30 and QP = 35 do not deviate much from each other.
In Table 8, the quality measurements for the video "Newspaper" are given. The $M_{3D}$ measurements taken at the SD spatial resolution vary similarly to the other objective video-quality measurements and especially to the MOS values. However, the $M_{3D}$ measurements taken at the CIF spatial resolution for QP = 25, QP = 30, and QP = 35 are equal. Also, some deviations are observed in the $M_{3D}$ and SSIM measurements at the QCIF spatial resolution. These equalities at the CIF spatial resolution and deviations at the QCIF spatial resolution result from the compression and downsampling processes applied to this video. However, these differences are insignificant at the reported numerical precision.
Table 9 shows the quality measurements for the video "Windmill". According to Table 9, only the $M_{3D}$ measurements taken at the SD and CIF spatial resolutions for the QP = 25 and QP = 30 compression ratios show deviations, and these are too insignificant to be perceived by the HVS. Some insignificant deviations are also observed at the QCIF spatial resolution.
Table 10, which gives the quality measurements of the video "Advertisement", shows that the $M_{3D}$ measurements are not compatible with the other objective video-quality measurements and the MOS values. The $M_{3D}$ measurements have large deviations at all spatial resolutions for all compression ratios. However, "Advertisement" is a CGI-based video, so the deviations most likely arise from the rendering method. The NSI-based video-quality-evaluation metrics do not give accurate results in the quality measurements of CGI-based videos.
The quality measurements of the video "Butterfly" are given in Table 11. According to this table, only the $M_{3D}$ measurements taken at the SD spatial resolution for the QP = 25 and QP = 30 compression ratios show a deviation. This issue is most likely caused by errors in the compression process. The rest of the $M_{3D}$ measurements are aligned with the other objective quality measurements and the MOS values.
The measurements of the video "Chess" in Table 12 show that only the $M_{3D}$ measurements taken at the QCIF spatial resolution show similar variations to other objective video-quality measurements and the MOS values. But, there are significant deviations in the SD and QCIF spatial resolutions for all the compression ratios. These deviations are most likely caused by the rendering method, as "Chess" is a CGI-based video, and the quality of a CGI-based video should be measured by using a CGI-based video-quality-evaluation metric.
Table 13 gives the quality measurements of the video "Farm". This table shows that only the $M_{3D}$ measurements taken at the SD spatial resolution are aligned with the other objective video-quality measurements and the MOS values. Although there are bias-like deviations at the CIF and QCIF spatial resolutions, these deviations are too insignificant to be perceived by the HVS. On the other hand, the VQM, PSNR, and SSIM measurements have deviations at all spatial resolutions and for all compression ratios because of errors in the encoding, compressing, and resizing processes.
Lastly, Table 14, which shows the measurements of the video "Football", indicates that the $M_{3D}$ measurements taken at the SD spatial resolution for the QP = 25 or QP = 30 compression ratios show deviations. Although there are further deviations at the CIF and QCIF spatial resolutions, they are very small and thus cannot be perceived by the HVS.
As a general assessment, the quality estimates of the proposed metric show significant similarities with the VQM, PSNR, and SSIM measurements and especially with the MOS values. Approximately 80% of the results obtained from the proposed metric vary in accordance with the MOS, VQM, PSNR, and SSIM variations. The majority of the remaining 20% show insignificant variations that cannot be noticed by the HVS. These cases are considered to be caused by spatial and temporal distortions due to encoding, compressing, resizing, upsampling, downsampling, pixel losses, or similar processes in the 2D-texture videos and DM sequences of the 3D videos used. In particular, the effects of the change in the compression ratio on the DM sequences are remarkable. In addition, the artifacts observed in some DM sequences led to inaccurate calculations of the depth cues and had disruptive effects on the $M_{3D}$ measurements (see Figure 5).
Moreover, the 3D-video QoE evaluation performance of the $M_{3D}$ relative to the VQM, PSNR, and SSIM metrics can be observed from the correlation coefficient (CC) results calculated by using the MOS results. The CC results, which are calculated by using the Pearson method and show the relationship between the $M_{3D}$ quality estimations and the MOS values, are given in Table 15. The average CC between the $M_{3D}$ and the MOS is computed as 0.775 over all the 3D videos, QPs, and spatial resolutions. The CC results between the $M_{3D}$ and the VQM, PSNR, and SSIM metrics are computed as 0.784, 0.772, and 0.838, respectively.
Table 15 is examined in more detail below. For the videos "Breakdance", "Ballet", "Interview", "Football", and "Butterfly", the $M_{3D}$ measurements have high correlation coefficients with the objective VQM, PSNR, and SSIM metrics and the subjective MOS measurements. This means that there are strong linear relationships between the $M_{3D}$ measurements and the other video-quality measurements used.
The lowest correlation coefficients between the $M_{3D}$ measurements and the other video-quality measurements are observed for the video "Advertisement". The correlation coefficients of this video are generally below 0.3, which indicates weak linear relationships between the $M_{3D}$ measurements and the other video-quality measurements. This also means that an increase in any video-quality measurement does not necessarily imply a higher $M_{3D}$ measurement and vice versa.
For the videos "Farm" and "Chess", half of the CC results are between 0.3 and 0.7, and the remaining half are above 0.7. The CC results between 0.3 and 0.7 indicate moderate linear relationships between the $M_{3D}$ measurements and the VQM, PSNR, SSIM, and MOS measurements, whereas the CC results above 0.7 indicate strong linear relationships.
For the videos "Windmill" and "Newspaper", the CC results of the QCIF versions are generally below 0.3, which indicates weak linear relationships between the $M_{3D}$ measurements and the VQM, PSNR, SSIM, and MOS measurements. As mentioned above, this situation arises because errors that occur in processes such as encoding, resizing, and downsampling are reflected negatively in the $M_{3D}$. The CIF and SD versions have high CC results, which indicate strong linear relationships between the $M_{3D}$ measurements and the VQM, PSNR, SSIM, and MOS measurements.
In light of the CC results in Table 15 and the explanations above, there is a useful correlation between the $M_{3D}$ quality estimations and the measurements of the MOS, VQM, PSNR, and SSIM. This correlation is worth considering when developing a new hybrid 3D-video-quality metric based on spatial resolution and depth cues.
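The correlation analysis above can be sketched with a plain implementation of the Pearson coefficient, together with a helper that applies the 0.3 and 0.7 thresholds used in the discussion. The function names, the handling of values that fall exactly on a threshold, and the sample scores are assumptions for illustration; the numbers are not taken from Table 15.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)

def strength(cc):
    """Label a coefficient with the thresholds used in the text (boundaries assumed)."""
    if abs(cc) < 0.3:
        return "weak"
    if abs(cc) <= 0.7:
        return "moderate"
    return "strong"

# Hypothetical metric scores and MOS values for QP = 25, 30, 35, 40, 45.
metric_scores = [10.0, 9.0, 7.5, 6.0, 4.0]
mos_values = [4.6, 4.2, 3.5, 2.9, 2.1]
cc = pearson(metric_scores, mos_values)
label = strength(cc)
```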

Conclusions and Future Works
Researchers generally use subjective tests to evaluate the quality of 3D videos. However, subjective tests have significant disadvantages, such as high cost, long duration, and unsuitability for real-time applications. For this reason, there is a great need for an objective and hybrid 3D-video QoE evaluation metric that is highly correlated with the HVS and aligns well with the MOS. To develop such a metric, it is essential to consider the effects of the depth cues and the spatial resolution, which directly affect the viewer's depth perception.
In this study, a hybrid 3D-video QoE evaluation metric was developed for the quality evaluation of 3D videos obtained by using the 2D + DM method, which is a method that researchers may prefer. The metric employs the effects of the spatial-resolution-associated blurriness, motion-information, retinal-image-size, and convergence parameters on the depth perception of the viewers. Blurriness and motion information are derived from the 2D color-texture video, while the retinal-image size and convergence are derived from the DM. The spatial resolution is derived from both the color-texture video and the DM.
This study emphasizes the critical role of the depth cues associated with spatial resolution in designing an effective 3D-video QoE metric. The results show that the proposed hybrid metric is successful and can be utilized to predict the 3D-video QoE. These results indicate that using depth cues and spatial resolution together as input parameters is an appropriate approach when developing a 3D-video QoE evaluation metric. In particular, the high correlation with the HVS supports the validity of the proposed metric's estimations. The proposed metric will allow researchers to avoid the high cost of subjective tests and save time. Also, as it is a hybrid metric, it is feasible to use it in real-time applications. For these reasons, it can accelerate studies on 3D-video technologies and encourage future work.
It has to be noted that the predicted MOS values can still be improved. In future work, it is possible to fine-tune the formulas by optimizing the coefficients, developing different models for measuring the depth cues, replacing existing depth cues with other depth cues, and/or adding extra depth-cue elements to the proposed metric to further improve the results.

Figure 1. The framework of the proposed 3D-video QoE evaluation metric.

Figure 3. Change in depth values in the Breakdance DM sequence.

Figure 4. The geometry of convergence. The green and orange circles represent two objects having different sizes. The grey circles represent the left and right eyes.

Figure 5. Some artifacts in the DMs of (a) Newspaper, (b) Breakdance, (c) Chess, and (d) Farm. The red rectangles/squares highlight some remarkable artifacts on the DM sequences.

Table 1. Blurriness measurements per QP and spatial resolution for the 2D-video sequences.

Table 2. Motion-information measurements per QP and spatial resolution for the 2D-video sequences.

Table 3. Retinal-image-size measurements per QP and spatial resolution for the DM sequences.

Table 4. Convergence measurements per QP and spatial resolution for the DM sequences.

Table 15. Correlation between the $M_{3D}$ measurements and the values of the MOS, VQM, PSNR, and SSIM.
